-
Notifications
You must be signed in to change notification settings - Fork 25.5k
Fix terms aggregation doc_count_error_upper_bound for already reduced results (batched query phase) #134645
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Pinging @elastic/es-analytical-engine (Team:Analytics) |
Pinging @elastic/es-search-foundations (Team:Search Foundations) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The logic makes sense to me, I left one question and some nits regarding how to expose the new flag.
|
||
protected abstract AggregationReduceContext forSubAgg(AggregationBuilder sub); | ||
|
||
public boolean doesFinalReduceHaveBatchedResult() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: that feels weird since it depends on isFinalReduce
? Maybe rename it into hasBatchedResult
so that callers have to check isFinalReduce
and hasBatchedResult
?
return finalReduceHasBatchedResult; | ||
} | ||
|
||
public void setFinalReduceHasBatchedResult(boolean finalReduceHasBatchedResult) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be final and set in the ForFinal ctr?
// If we are reducing only one aggregation (size == 1), the doc count error should be 0. | ||
// However, the presence of a batched query result implies this is a final reduction and a partial reduction with size > 1 | ||
// has already occurred on a data node. The doc count error should not be 0 in this case. | ||
docCountError = size == 1 && reduceContext.doesFinalReduceHaveBatchedResult() == false ? 0 : sumDocCountError; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does that also handle the case where the partial reduction happens on the coord node (when reaching reduce batch size)?
There's logic in AbstractInternalTerms that sets the top-level doc_count_error_upper_bound to 0 if only one aggregation is being reduced. However, now that we have batched query execution, it's possible a reduction of multiple aggregations has already occurred on a data node, and is now undergoing a final reduction on the coordinating node.
I discovered this case while attempting to remove the settings override turning off batched query execution in TermsDocCountErrorIT (testFixedDocs with -Dtests.seed=ABFC03388645940D):
I've attempted a lightweight fix here: flag the existence of a batched query result in the AggregationReduceContext, then check this flag when potentially setting doc_count_error_upper_bound to 0. If a batched query result is present, it implies multiple shards were reduced remotely (a single shard on a data node is not batched).